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The accurate estimation of scaling exponents is central in the observational study of scale-invariant 
phenomena. Natural systems unavoidably provide observations over restricted intervals; conse- 
quently a stationary stochastic process (time series) can yield anomalous time variation in the 
scaling exponents, suggestive of non-stationarity. The variance in the estimates of scaling exponents 
computed from an interval of N observations is known for finite variance processes to vary as 1/A'' 
as oo for certain statistical estimators; however, the convergence to this behaviour will depend 

on the details of the process, and may be slow. We study the variation in the scaling of second 
order moments of the time series increments with A'^, for a variety of synthetic and 'real world' time 
series; and find that in particular for heavy tailed processes, for realizable A'^, one is far from this 
~ limiting behaviour. We propose a semi-empirical estimate for the minimum A'' needed to 
make a meaningful estimate of the scaling exponents for model stochastic processes and compare 
these with some 'real world' time series. 

PACS numbers: 05.45.Tp, 89.75.Da 



I. INTRODUCTION 

Testing for, and quantifying scaling is central to the 
application of statistical theories to 'real-world' extended 
systems. A broad range of theoretical frameworks such as 
turbulence critical phenomena [2] and complex sys- 
tems 3 frame their predictions in terms of the statistical 
properties of (arbitrarily large) ensembles as a function 
of scale. Under the assumption of ergodicity the statis- 
tical scaling property of an extended system is captured 
to some extent by a reduced (embedded) set of observa- 
tions or measurements; so that a 1-D cut through a N 
dimensional system will be sufficient to indicate whether 
scaling is present, and in a quantitative way can usefully 
restrict the scaling exponents of the system as a whole. 
This approach is pragmatic - in physical systems it is 
generally not practicable to capture and analyze the full 
spatiotemporal information of all points in the system on 
all scales. A key observable is then the quantitative scal- 
ing properties of such a one dimensional sample or time 
series. An example of this is the use of the Taylor hy- 
pothesis in turbulence, where the time series at a single 
point is used as a proxy for the statistical properties of 
the flow [3. 

Time series are also often parsed into sub-intervals to 
isolate processes of interest, or to remove features which 
might contaminate the calculation of the quantity of in- 
terest. Examples of this in the study of solar wind tur- 
bulence are the separating of fast and slow wind, and 
open/closed field line regions 0, @]; isolating or removing 
signals of interplanetary shocks, magnetosheath cross- 



ings, and coronal mass ejection remnants 0, O'" where 
the interval is restricted by a secular change in parame- 
ters as the spacecraft changes location . Examples 
in the study of the earth's geomagnetic field include iso- 
lating 'quasi-stationary' and 'quiet' intervals in magnetic 
field data [ll| ; and the effects of finite sample size in 
the power spectral exponent estimates in the ionosphere 
by ground-based measurements [l2|. 'Locally station- 
ary processes' are also discussed in speech signal analysis 
[Tst and physiological non-stationary signals and of 
course statistical forecasting, whether in the context of 
seasonal weather or the financial markets [lB|, is based 
on time series histories which rely on the stationarity as- 
sumption. In all of these cases, it is intuitively apparent 
that smaller data intervals will result in poorer statistics, 
which will be manifest in the variance of the observed val- 
ues of the exponents. The observation of a (non-secular) 
variation in the scaling exponents therefore has two in- 
terpretations; either it is due to intrinsic fluctuations as 
a result of the finite N interval, or it is a consequence of 
non-time stationarity of the time series x{t) i.e. different 
scaling behaviour due to different physical phenomena. 
If the properties of the underlying process are not known 
a priori we need a method to distinguish these two in- 
terpretations in a quantitative manner; or at best to put 
a degree of confidence that it is due to one and not the 
other. 

The most commonly used tool to both establish and 
quantify scaling in a time series x{t) is to test for scal- 



ing of the power spectral density F{uj) 



^, and ob- 
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tain the exponent /3. In a physical system, such scal- 
ing can only be observed over a finite range, limited by 
the interval (in time t) of N observations over which the 
system is measured. From large-sample theory (asymp- 
totic limit of N 00) the variance in the power spec- 
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tral exponent (3 computed using least squares reg ression 
from an interval of N samples is known [l6l [TtI for fi- 
nite variance processes to vary as ~ as iV oo. 
One method to obtain more complete information about 
the scaling properties of a stochastic process x{t) is cap- 
tured by how the statistical properties of the increments 
y{t, t) = x{t + T) — x{t) vary with the differencing scale r; 
the differencing being a particular type of coarse-graining 
operation which has been chosen due to the easy analogy 
with random walks, return probabilities etc. However, 
there exist other coarse-graining operations which al- 
though more involved, possess additional highly desirable 
properties when studying scaling. In particular, wavelets 
which (with some wavelet functions) when combined with 
their detrending capabilities have been shown to be a 
natural and computationally efhcient way of studying 
scale- by-scale statistical behaviour [l^ [3, [l^. In this 
paper we will discuss the behaviour of the scaling prop- 
erties of the second order moment (y(t,T)^) ~ r''^^^. We 
may anticipate that the statistical properties of this scal- 
ing exponent C(2) follow that of f3; indeed there exist 
many results for a range of different estimators of the 
C(2) 0, Hi that directly show the asymptotic ^ 1/N 
behaviour discussed above. In practice, the convergence 
to this ^ behaviour will depend on the details of the 
process and the estimator and, as we shall show in this 
paper, is often slow. 

An essential tool in the analysis of 'real world' time se- 
ries in the context of scaling is then a prescription for the 
variance in the scaling exponents of x{t) as a function of 
the number of observations N in the chosen interval. In 
this paper we make some first steps in this direction by 
obtaining empirical estimates from the study of a vari- 
ety of stochastic processes that have been used as models 
for physical systems. We focus on finite size N realiza- 
tions of self-affine cases with Gaussian distributed incre- 
ments in the form of a standard Brownian motion and 
fractional Brownian motion (fBm); and those with heavy 
tails, namely a-stable Levy motion and linear fractional 
stable motion (LFSM) [H HI, [111 . A representative case 
for multifractal scaling is provided by the p- mo del, often 
used to characterize observations of turbulence [1^ [2^ . 

The fundamental property of ergodicity in systems 
that exhibit scaling implies time stationarity. In its 
strong sense time stationarity implies that the proba- 
bility density function (PDF) of x{t) does not change 
with time; this is known as strict stationarity. Pragmat- 
ically, weak stationarity, that is time independence of 
the variance or second order moment is usually adopted 
- the latter convenience is usually assumed due to the 
special place that the Central Limit Theorem and the 
Gaussian distribution hold in statistics. In this paper we 
are concerned specifically with the behaviour of scaling 
exponents which are characterized through the statisti- 
cal properties of the increments y(t, r), rather than the 
time series x{t); hence we will use as our test time se- 
ries examples that have stationarity in y{t, r), and not in 
x{t). 



We will focus on the statistics of the scaling exponent 
of the second order moment of the increments, as this 
also captures the power spectral exponent f], and for self- 
affine finite- variance processes the Hurst exponent H (see 
next section and also [13] for the infinite variance case). 
We will study these processes for a range of values of N 
that are feasable for realisable physical systems; and find 
that in particularly for the heavy-tailed processes, the 
variance in the exponent with A'^ shows a significant de- 
parture from the 1/A^ asymptotic behaviour. Neverthe- 
less, for these heavy-tailed processes, we find empirical 
evidence of an intermediate range of scaling with N~'^ . 
We will estimate the time series interval A^ required to 
capture the scaling exponent to reasonable precision; this 
places a lower limit on the sample size. A related study 
to this was conducted to investigate and compare the 
effects of finite sample size on different statistical esti- 
mators for the Hurst exponent H for a Gaussian white 
noise process (28| . Stationarity also implies a particu- 
lar PDF of the values of the exponent obtained from 
many, length A^, realisations of a given process. This is 
known asymptotically for N ^ oo for the processes based 
on Gaussian increments (generalizable to finite variance 
processes) and is also known in this asymptotic sense 
for infinite variance processes; both processes approach- 
ing a Gaussian distribution for the scaling exponents as 
A^ ^ oo [H, [13, [10, [13, Hi, 113] (using least squares and 
maximum likelihood estimation schemes). For the inter- 
mediate stage of finite N we find intermediate distribu- 
tions for the exponents; resembling both the asymptotic 
Gaussian forms and, for heavy-tailed data, Gumbel max- 
stable (Extreme value type I) distributions. Comparing 
these results with that found for real-world time series 
may provide an additional test for stationarity in the in- 
crements. In this spirit we finally illustrate these ideas 
with some examples of real- world time series in the form 
of in-situ observations of magnetic field and velocity in 
the turbulent solar wind using data from spacecraft at 
lAU in the ecliptic; and comment on the statistical prop- 
erties of their scaling exponents in light of the represen- 
tative synthetic toy models. 

II. METHODOLOGY 

We will focus attention on the scahng exponent C(2) of 
the second order moment of the increments also known 
as the second order structure function: 

{y{t, rf) = ({x{t + r) - x{t)f) = {y{t, if) r^f^) , 

where we have assumed that the increment process is at 
least second order stationary i.e. (y(t, t)^) = (?/(r)^) 
(weak-stationarity). In particular, this implies that the 
power spectral density of a discrete time random walk 
x(t) of i.i.d. stationary increments with finite variance, 
scales as [ll] 

F{cj) ^ ^ (2) 
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where the scahng exponent C(2) is related to the power 
spectral exponent (3 otx{t) by C(2) = f3—l. For self-affine 
process with Hurst exponent H the PDF P{y, r) of the 
increments obeys the scaling relation (for the case of a- 
stable processes with finite N see [27|] , and the discussion 
to follow) 

P{y,r)^r-"V%r-"y) , (3) 

where the PDF P at any scale r can be collapsed onto 
a unique scaling function V. The scaling relation Q 
implies that the scaling of the structure functions to all 
orders p [131 is given by {y{TY) — {y{\Y) t^^p^ where 
C(p) = Hp] and thus we have that C(2) = 2H . Our 
results concerning the statistical behaviour of C(2) with 
N will thus also apply to the power spectral exponent 
(3 for all the models concerned and the Hurst exponent 
H for the self-affine models; both are commonly used 
to characterize statistical scaling. Our remarks can also 
be generalized to the scaling exponents C(p) of structure 
functions of higher-order positive moments. These are 
relevant for multifractal processes where the C,{p) are a 
nonlinear function of p and so 7J or /3 are not sufficient to 
determine the complete statistical scaling of the y{t, t). 

Our study consists of partitioning a given time series 
x{t) into L equal intervals of sample size N denoted by 
Xi{t) where i ~ 1 . . . L. Each of these intervals are then 
differenced on scale r to produce a time series of the 
increments yi{t,T) = Xi{t + T) — Xi{t) of the process Xi{t). 

We will look for scaling of the second order moment 
(structure function) 

Mfir) = (y.(r)2) = T y^ P,iy,,T)dy, , (4) 

with r such that Mf{T) = Mf(l)r^'(2). Again, the in- 
dex i indicates the i*'' interval over which the exponents 
are calculated and tracks any (real or statistical) time 
variation in the value of Ci(2)- In an infinitely large in- 
terval, N oo, the limits of the integral yf^ —^ ±oo; here 
however each i*'* interval of the time series will impose 
different finite extremal values y^. For the heavy-tailed 
processes in particular, the statistics of the yf^ can be 
anticipated to have a significant effect on the statistics of 
the (i{2); this has been discussed for the case of a- sta- 
ble Levy processes in These Levy processes, possess 
heavy tails in the PDFs of their increments, with tails 
that fall as P{y) ~ power-laws. The a-stable 

Levy processes have divergent moments for p — 2 and 
above; for a finite sized sample the moments exist but can 
be dominated by the behaviour of rare outlying points in 
the tails which introduce a pathological bias when es- 
timating scaling exponents from the moments (27| (for 
a wider discussion see [31]). We circumvent these diffi- 
culties, at least for self-affine time series, by restricting 
our analysis to the scaling of the second order moment 
C(2), and by using the iterative conditioning technique 
[27|. This simple and robust technique for exponent es- 
timation removes a small percentage of the extreme data 



values which are poorly sampled statistically. In some 
pathological cases such as the a-stable Levy distributions 
these rare extreme points are of the order of and some- 
times larger than the whole sum [33|. Because they are 
so large they tend to dominate the statistics and thus the 
scaling of the higher order moments. This can be clearly 
seen if we look at the discrete definition of the moments 
of order p 

^ ^ 

^n-) = ]v^^A- ■ (5) 

The reasoning and full illustration of this iterative condi- 
tioning method to heavy-tailed non-Gaussian distribu- 
tions is discussed in [23] ■ Although not discussed in 
this paper, the iterative conditioning technique is also 
an unbiased robust technique for distinguishing self-affine 
(monofractal) from multifractal behaviour. 

We will focus here on parameter stationarity as op- 
posed to trend stationarity. The former refers to the 
change in the intrinsic dynamics of the process of interest 
as characterized by its quantitative statistical properties 
(the behaviour of the moments); as opposed to the latter 
which is simply an additive trend to the signal. In partic- 
ular we will focus on the stationarity of the scaling of the 
moments as captured by the exponents Ciip)- If secular 
trends are present in the time series then the time series 
of increments will be approximately trend-free provided 
our differencing scale r is sufficiently small [s^. A sec- 
ular trend can also be removed by detrending or by the 
method of studying the scaling of moments of wavelet 
coefficients where an appropriate wavelet is chosen with 
a large number of zero-moments [lol [3^ . The more com- 
plex case of mixed dynamics i.e. two or more intrinsically 
different processes represented in different sections of a 
time series will not be considered here. 



A. Data generation and sources 

We will consider synthetically generated signals that 
are both stationary and nonstationary with respect to 
their increments. The signals with stationary indepen- 
dent increments will consist of a standard Brownian mo- 
tion and standard symmetric a-stable Levy motion for 
four values of the exponent a [13, [1^ ; the latter being 
highly non-Gaussian and heavy-tailed with very large ex- 
cursions in their time series. To survey a broad range 
of such processes we have also included non-Markovian 
versions of the above processes. These include a long- 
memory fractional Brownian motion (fBm), and a long- 
memory persistent and anti-persistent linear fractional 
stable motion (LFSM) - see [tZ, 22, 23, 24jfor more 
details on these processes and in particular [2J] for the 
algorithm and MATLAB code for the LFSM. 

We also investigate a multifractal time series generated 
from a discrete multiplicative cascade process in the form 
of the p- model [HI, [26(] . The p- model is used as a model 
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for intermittent turbulence [l|, [s^. The intermittency 
of the p-model time series leads to non-time stationary 
finite N moments; however the set of scaling exponents 
Ciip) are stationary. 

The nonstationary time series we will consider are a 
standard Brownian motion with linearly varying stan- 
dard deviation of its increments with time cr ~ i, and 
cyclically varying standard deviation (cyclically station- 
ary) (T ~ sin'^{t). All of the above synthetic time series 
were generated in MATLAB with appropriate random 
seeding and sample sizes of iV ~ 10^. 

Lastly, we will consider three real-world time series 
which have been found to exhibit scaling [s^, [H, [H, [13] ■ 
These consist of two time series of 100 second resolu- 
tion magnetic field and speed v from the NASA 
WIND spacecraft at lAU in the solar minimum year 
1996; and a 64 second resolution one year long time 
series of the magnetic field energy density from 
the NASA ACE spacecraft in the solar maximum year 
2000. All of these time series consist of ~ 5 x 10^ 
data samples and can be downloaded from CDA web 
http : //cdaweb . gsf c .nasa. gov/. 



III. RESULTS 

We study the variation of the scaling exponent of the 
second order moment Ci{2) with sample size N. The 
process by which the exponent Ci(2) is estimated for L 
contiguous intervals of N points of a time series is illus- 
trated in Figure [T] for the p-model. We begin with the 
time series in Figure [TJa) which we parse into L inter- 
vals. For each of these intervals we obtain an estimate of 
Ci{2) as the gradient of a linear least squares regression 
to a log-log plot of the second-order moment Mf (r) ver- 
sus the scale or differencing parameter r. This method 
of obtaining the scaling exponents is also known as the 
structure function technique [l|, and is closely related 
to variance plot, correlogram and log-periodogram tech- 
niques [13, [sqI - in the latter reference [s^] it is identical 
to the variogram technique. We focus on this particu- 
lar method to estimate Ci(2) as it provides a point of 
contact with asymptotic N oo estimates of the vari- 
ance of the power spectral exponent P which are based 
on linear regression over a finite range power law power 
spectrum (Ty, [13] • In both cases, the variance in the 
estimated exponent will depend upon the details of the 
linear regression. For the second order moment these de- 
tails include the range of values of r over which M^{t) 
is a power law; the number of different r for which we 
calculate (r) and use in the linear regression; and the 
uncertainty of each Mf{T) value. In all cases considered 
here we optimize these details to minimize the linear re- 
gression error but importantly use the same algorithm 
for all of the sample time series that we discuss. 

The linear fit is obtained by linear least-squares regres- 
sion which also provides an estimate of the error. We 
augment this estimate of the error by varying the start 



and end points of the regression by a few points and ob- 
taining the difference in the exponents. The linear regres- 
sion was done over ^ 20 values of the scale parameter r, 
where r was increased geometrically as r = hase^ , where 
fc € {0, • • • ,40} and base was chosen to be 1.2. The fit 
was done over this reduced set of measurements at ^ 20 
values of r so that a fair comparison can be made with 
the real-world data (to be discussed later) where only a 
limited power-law range is seen. 

Due to its highly intermittent nature the p = 0.6 p- 
model is not time stationary in its finite N moments and 
this can be seen in Figure [T](b) where we plot consecutive 
values of the second order moment M^{t) obtained for 
each of the L intervals of N points, shown for r = 1 and 
two values of N . For the p-model time series shown here, 
the second order moment follows the local amplitude of 
fiuctuations in the time series itself; comparing the ratio 
of the amplitude of these fluctuations to the signal am- 
plitude is one of the classical 'first base' techniques for 
establishing whether the signal is stationary (in the weak 
sense) [3^. As one would expect from ([5]), this variation 
of the second-order moment M^(t) with the amplitude 
of the time series is emphasized as we decrease N as any 
estimates of the statistics from smaller sample size will 
naturally mimic the more local features of the time series. 
This behaviour is more pronounced in very intermittent 
signals i.e. those with heavy-tailed fluctuation PDFs. 

We also plot in Figures [2c) and (d) the correspond- 
ing estimates of Ci(2) for each interval. These two panels 
show the same data, that is, the estimates of Ci(2) plot- 
ted without (c) and with (d) error bars obtained from 
the linear regression and the error augmentation outlined 
above. As intuitively expected, if we decrease the sample 
size N over which the 0(2) are computed, the scatter 
increases. However unlike the moments, there is no clear 
trend with the amplitude of the signal, indicating sta- 
tionarity of the scaling exponent Ci(2)- This latter phe- 
nomenon will also be encountered in the non-stationary 
Brownian time series we will study. The estimates of 
Ci(2) can be seen to vary by up to a factor of two for 
N — 10^ for this realization of the p-model. This under- 
lies the difficulty of obtaining physically meaningful es- 
timates of scaling exponents for physically realizable N . 
We can see that the error bars approximately capture the 
fluctuations in the estimates of Ci(2) for the case of the 
p-model. As we wish to include strongly non-Gaussian 
processes in our study, we will henceforth present numer- 
ical estimates of the variance of Ci(2) obtained directly 
from computing many values of Ci(2) rather than the lin- 
ear regression error. 

The essential point is that quite signiflcant variation 
in the scaling exponents can arise in time stationary, but 
intermittent, time series; even when these are estimated 
over intervals of data that might intuitively be considered 
to be of adequate length. In order to distinguish variation 
in the scaling exponents that is statistical (finite N effect) 
as shown above, from that which refiects intrinsic non- 
time stationarity, some estimate of the N dependence 
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Figure 2: (Color online) a.) The variance of conditioned ^(2) 
with sample size TV for all the synthetic finite variance pro- 
cesses studied shown on a log-log plot, b.) same as in a.) 
for all the synthetic infinite variance processes studied. The 
diagonal dashed line on both these plots indicates a negative 
slope of unit gradient so that comparison with theoretically 
expected asymptotic behaviour can be made. The vertical 
black dashed lines indicate the areas outside of which errors 
begin to dominate due to i.) (bottom vertical line) lack of 
values of C»(2) to make a decent estimate of Var {({2)); and 
ii.) (top vertical line) failure of the iterative conditioning 
technique to obtain unbiased estimates of C(2). 



Figure 1; (Color online) a.) Time series of length A'' = 10^ 
for the p-model (p — 0.6). b.) Variation of the second order 
moment of the increments yi{t, r) for time-scale r = 1 of the 
above time series where the original time series has been par- 
titioned into L = 100 and L = 10 equal sized intervals, c.) 
Variation of conditioned 0(2) with time for the p-model with 
the same segmentation as in b.) - also shown are the mean 
values of the exponents for different partitioning correspond- 
ing to different sample sizes; d.) Same as (c.) but with errors 
explicitly shown. 



of the variance of ({2) is needed; this will also point to 
an estimate of the minimum number of observations N 
needed to obtain a 'reasonably accurate' estimate of C(2). 
We will now explore the variance of C(2) as a function of 
N. 

In the limit N oo, (3, when estimated via a log- 
periodogram varies as yar[/3] ~ [l6l. [l7l|. This lim- 
iting behaviour is also known for some other estimation 
schemes of the self-similarity parameter (as we dis- 
cuss later here) . Thus we would anticipate that for suffi- 
ciently large N, Var[Q{2)] ^ 1/N for our moment scaling 
estimation also. However, we do not know the rate of con- 
vergence with N to this limiting behaviour and can also 
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anticipate that this will depend upon whether the PDF 
of the increments is heavy-tailed, and whether or not the 
increments are dependent - both of which introduce fur- 
ther difficulties in obtaining an unbiased estimator. 

In Figure [2] we plot the variance of Q{2) against the 
sample size N on log-log axes, for a range of N that are 
feasable in realistic realizations of physical systems. Fig- 
ure dja) shows the behaviour of a subset of our synthetic 
time series that are intrinsically finite variance processes; 
Figure [Hb) shows all the synthetic time series from infi- 
nite variance processes that we consider. Plotting these 
on log-log axes reveals a characteristic power law trend 
for all the processes: 



Var[C{2)] ^ CN-"' 



(6) 



We see that indeed, 7 ~ 1 for the intrinsically finite vari- 
ance processes. More pragmatically, we can use this plot 
to make an estimate of the minimum sample size Nmin 
needed in order to estimate ^(2) such that the error in- 
troduced from the small sample size N = Nmin is, say, 



5%. We propose a simple criterion 



C(2)|l=i 



< 0.05 



(7) 



where C(2)|l=i is the value of ^(2) estimated for the en- 
tire time series (assuming that the scaling is stationary). 
This leads to 



Var[(:{2)] < (0.05C(2)|l=i)^ 



(8) 



where the value of Nmin is extrapolated from the plot 
of yar[C(2)] Vs. N, from Figure [2^a). For these finite 
variance processes expressions ^ and ([8]) yield Nmin ~ 
10'^ for the fBm model; Nmin ^ 10'* for the standard 
Brownian motion (stationary and non-stationary); and 
Nmin ~ 10^ for the the p-model. 

One can invert these relationships to obtain the ap- 
proximate error on Ci(2) given a sample size N from 
which it was calculated. The constant C in (l6|) is also 
intrinsic to our estimate of Nmin', operationally the pro- 
cedure for obtaining the error on Ci(2) in this manner 
would also include estimating C from the plot in Figure 
Ha). 

Processes that show scaling often have increments 
drawn from a heavy tailed PDF, these may also not 
intrinsically have finite variance as is the case for the 
a-stable Levy processes. Figure ^h) shows the TV de- 
pendence of all the infinite variance synthetic time series 
that we have considered, including those with long-range 
memory. The curves are all generated from time series 
which possess heavy-tailed PDFs for their increments. 
These include both the ordinary and fractional Levy in- 
crements. The curves in Figure (Ub) have a range of 7 
values close to but also clearly distinct from 7 = 1. As 
will be discussed later this is due to slow convergence to 
the asymptotic A'^"^ behaviour; from Figure [2Kb) we can 
see that the Levy process which is closest to Gaussian, 



namely with a = 1.8, has behaviour closest to 7 ~ 1. In a 
similar way to the method used above for the finite vari- 
ance synthetic processes, we make empirical estimates of 
Nmin required to obtain an estimate of C(2) to within 
~ 5% for the infinite variance processes. For the Levy 
processes a = 1.2 and a = 1.4, and LFSM {H = 0.44, 
a = 1.4) Nm^n ^ 10^; for the a = 1.6 case TV„„ - 10*; 
and for the a = 1.8 case and LFSM {H = 0.9, a = 1.6) 
Nmin ~ 10'^. The relevant property in the context of es- 
timating the uncertainty on Q(2) is that for realizable TV, 
these processes do not show an N"^ dependence. Also, 
unlike the Gaussian processes in Figure[2Ka) which cluster 
around a similar C value, the infinite variance processes 
have noticeably different values of C which depends on 
both the the tail exponent a and also on the degree of 
memory in the process given hy H — (l/a) [20| . 
Finally, combining equations ^ and ([5]) we obtain 



Nm^n=C'/'^ (0.05C(2) |i = l) 



-2/7 



(9) 



where both C and 7 depend on the process in question; 
and for finite variance processes 7 = 1. 



Error analysis 

To estimate the errors in the estimates of Var{({2)) 
a small monte-carlo type study was performed in which 
different random seeds were used to generate 10 different 
realisations of the two archetypal processes studied here 
i.e. finite and infinite variance processes in the form of 
10 different realisations of a standard Brownian motion 
and a standard a-stable Levy process {a ~ 1.4). The 
computation of the logyar(C(2)) Vs. logA^ plots were 
then calculated for each of these realisations; these are 
shown in Figures ^a) and (b). We then average over 
these realisations to obtain an average value of Var{(^{2)) 
for each A^, shown on log-log axes in Figures[3Kc) and (d); 
the ensemble of realisations also provides an error on this 
value via the maximum deviation from this average. 

At large A^ errors are dominated by there being fewer 
values of computed 0(2) and at small A^, by poor res- 
olution of the PDF from which we ultimately compute 
Q{2). In particular, at small A^ we can see from the plots 
for the a-stable processes that there is a systematic de- 
viation from power law behaviour in A^. This arises from 
a breakdown in the iterative conditioning technique (27j 
at small A^. 

In the next section we will discuss the PDFs of the 
scaling exponents Ci(2) obtained from this study. When 
these are close to Gaussian, standard Chi-squared distri- 
butions and F-test techniques could provide methods of 
obtaining errors for values of yar[C(2)], even from a sin- 
gle realisation. In this context we should mention the use 
of bootstrap re-sampling methods for providing distribu- 
tions, confidence intervals and statistical significance for 
parameter estimates in situations when one is limited by 
a single realisation [4l|, Although the convergence 
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Figure 3: (Color online) Plots showing how errors can be ascribed to the plots in Figure [2] The top plots show the results of 
the study for 10 different randomly seeded realisations of sample size A'^ — 10*' for a.) a standard Brownian motion and b.) an 
a-stable Levy motion (a=1.4). Plots c.) and d.) are the mean averages of the realisations in a.) and b.) respectively, where 
the error bars are calculated from the maximum deviation from this mean in the 10 realisations. 



and consistent properties of such techniques in the case 
of heavy-tailed distributions [13, 0, and especially 
infinite-variance processes are still unclear we envisage 
the use of such methods in future research. 

Finally, one could in principle increase the available 
number of values of 0(2) by overlapping intervals of size 
N within a given single realisation. We have, however, 
found that this introduces a significant systematic bias 
in the computed values of Far [^(2)]. 



Real-world data 

We calculate Var[C(2)] values for the examples of real- 
world data sets discussed earlier in the introduction. The 
plot detailing this study is shown in Figure HI For com- 
parison we have also included on this plot the variation 
of Far[C(2)] with N for the two archetypal cases of finite 
and infinite variance processes in the form of a standard 
Brownian motion and an a-stable Levy process (a = 1-4); 
we also plot a negative unit slope for the asymptotic 
N ~* oo behaviour obtained from large sample theory, 
this is indicated by the dashed line. Figure S] shows that 
the real-world data can show significant departures from 
the synthetic data. 

The WIND data illustrates the effect of large data gaps 



which are not present in the ACE data shown; this lim- 
its the amount of data available for certain N which is 
reflected in the corresponding estimations of Far [^(2)]. 
For the ACE data we can see a clear systematic de- 
parture from the synthetic models. We will discuss this 
latter data set in the next section below. 

The problem with a single length N realisation is that 
we cannot calculate the errors on V^ar[C(2)] as done in 
the previous section; and thus have no way of gauging 
how close these graphs are to the expected asymptotic 
behaviour predicted by large-sample theory. However, 
one can still estimate an error for measurements of C.{2) 
obtained from a finite data size N, in the same way as 
was done in equations ([Zl)-®. For example, in the case 
of the ACE B'^ data this would indicate that a iV ~ 10^ 
sample size would introduce an error of ^ 12% in the 
estimated values of C(2) using the iteratively conditioned 
moment scaling technique. 



A. Underlying statistics of (^(2) 

We plot in Figure [5] the PDFs H {({2)) for three of 
the representative models we have studied along with the 
PDFs H (C(2)) for one of the real-world data sets. For 
each of these time series, PDFs have been constructed 
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Figure 4: (Color online) Plot of the variance of conditioned 
("(2) with sample size A*' for all the real-world data sets studied 
shown on a log-log plot. The dashed line on both of these plots 
indicates a negative slope of unit gradient. Also included for 
comparison are the archetypal synthetic data sets for the finite 
variance and infinite variance processes. 



for two different sample sizes N. We see that apart from 
the Qf-stable case, these PDFs are well described by a 
Gaussian distribution, as can be seen by the Maximum 
Likelihood fits. The a-stable Levy case is shown in Fig- 
ures [SId i.) and b ii.) to be well described by a Gum- 
bel max-stable Extreme Value distribution [46|. To see 
why nearly all of our PDFs corresponding to finite vari- 
ance processes are close to Gaussian we appeal to large 
sample-theory. 

To facilitate understanding we employ more heuristic 
arguments at the expense of mathematical rigour. Inter- 
ested readers can find more on the mathematical details 
and proofs in [l^, |43| which deal with spectral param- 
eter estimates of stron g lo ng-range dependent Gaussian 
stationary time series; [29j for a non-stationary general- 
ization of these; [2l| for a pseudo-variogram estimator 
(similar to the method in this paper) to long-range de- 
pendent Gaussian processes with stationary increments; 
and the more recent and extensive paper by Stoev, Pipi- 
ras and Taqqu [20j which extends the proofs and argu- 
ments of [21[ to infinite variance processes in the form of 
a-stable and linear fractional stable processes. This lat- 
ter reference will be our main source and point of contact 
for what follows. A survey of many of these papers and 
parameter estimation techniques can be found in [ITII . 

As mentioned above, we have chosen Stoev et. al. |20 | 
as a point of contact from amongst the extensive liter- 
ature concerning asymptotic large sample behaviour of 
parameter estimates. This is primarily because this work 
has dealt with infinite variance processes of the type dis- 
cussed here; also our moment scaling technique is a par- 
ticular form of one of the main estimators used in (20j 
(see also (U). We have also used Stoev's MATLAB 
algorithm for generating the LFSM realisations used in 
this paper. Similar to our study Stoev et. al. use a 



least squares regression on the moments which they refer 
to as the 'power' estimator. However, instead of tak- 
ing moments of the increments as we do, they take the 
moments of coefficients for a Finite Impulse Response 
Transformation (FIRT) of the discrete time-series, which 
is characterised by a discrete filter of n members. Our 
increments are one of the simplest forms of these FIRT 
coefficients if we take the filter to be comprised of a set 
of n = 2 members {—1, 1}. However, any extra benefits 
of having more than one zero-moment (moments which 
are equal to zero) will be lost due to this simplicity. This 
also applies to the wavelet coefficients used in the study 
of Stoev et. al. where our increments result from tak- 
ing the mother wavelet to be the superposition of two 
delta functions (one positive, one negative) separated by 
a scale r - also known in the literature as the 'poor man's 
wavelet' [1^. Also, an important fact to note when com- 
paring the methods of Stoev et. al. to ours is that we 
differ with the 'power' method of these authors by using 
the iterative conditioning technique which by censoring 
and excluding the poorly sampled large extreme events, 
correct for the bias which is pathological in the case of 
heavy-tailed distributions [23| . 

Recall that the second order moment is scaling as 
M2(r) = M2(1)t^(2) and we will be estimating C(2) from 
the gradient of a log-log plot 



logMf(T) =C(2)logT-HlogM2(l) 



(10) 



For the discrete data the gradient can be estimated via 
least squares linear regression, and the problem can be 
set out as 



where 



N 



logMf(Ti) 



(11) 



(12) 



log M2(rfc) 

is the vector of the observations (or dependent variables); 

logTi 1\ 

; : , (13) 

logTfc 1 / 

contains the vector of the scales (or independent vari- 
ables) ; 



Z = 



C(2) 

log A/2 (1); ' 



(14) 



is the vector of the parameters needed to be estimated; 
and 



/ 



7v(logAf2(ri)-logM2(Ti)) \ 



(\0gMf{Tk) - \0gMf(Tk) 



(15) 
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Figure 5: (Color online) PDF's _H"(("(2)) obtained for (a) a standard Brownian motion, (b) Q-stable Levy process (a = 1.4), (c) 
p-model (p=0.6) and (d) ACE 2000 for two different sample sizes (i) N = 1000 (iV = 500 for ACE) and (ii) A*' = 10, 000 
{N — 5000 for ACE). The sample PDF's are overlaid with Maximum Likelihood Estimate (MLE) model fits of a Normal 
distribution for a, c and d; and Gumbel max-stable (extreme value type 1) fits for b. For both these models jj, is the location 
parameter and a is the scale parameter, which for the Normal distribution coincide with the mean and standard deviation. 
The samples of (i(2) (varying with time t) from which these PDFs are constructed, are shown in the corresponding inserts of 
each plot. 



10 



is the vector of estimation errors between the sample 
measurements and those of the true expected values (de- 
noted by M) for which the scaling relation in (fTO| actu- 
ally holds. The solution to equation pT|) is then given by 
the well known ordinary linear least squares estimator to 
the parameters as 



Z — (T^ogTiog 



log 



(16) 



where superscript t represents a matrix transpose. ([TJ 
is simply a linear combination of the dependent variable 
log Mf{T) i.e. a sum, so that for Ci(2) one can write the 
ordinary least squares estimate as 



C.(2) = ^ a, (log M2(t,)) 

i=i 



(17) 



where the aj are all the elements of the appropriate vector 

from (TfogTiog) ^ T^^g- We wiU return to this form of 
the ordinary least squares solution shortly. 

Adapted to the notation used in this paper, Theo- 
rem 3.1 in states that if ^(2) is the FIRT coefhcient 
estimator (using least squares regression) for the scal- 
ing exponent and (^(2) the true expected value; then as 
lim N oo 



(18) 



C(2)-C(2) -AA(0,a2) 



where A/" (O, cr^) is a Normal distribution with mean 
and variance a^. Strictly speaking this theorem requires 
that the FIRT coefficients obey the following inequality 
between the number of zero-moments of the FIRT filter, 
the self-similarity parameter H and the tail (stability) 
index a 



Q>H + 



1 



a (a — 1) 



(19) 



which in the case of the ordinary Brownian motion and 
the a-stable Levy processes, where H = 1/a, generalises 
to 



Q > 



1 



(a-l) 



(20) 



The moment scaling scheme based on the raw increments 
has only one zero moment, hence Q — 1; and as a result, 
except for the p-model for which a and H are unknown 
(or not applicable), the above inequality does not hold 
for any of the synthetic models. However as mentioned 
in [2l[ for Gaussian processes, where in equation (fT9)) 
a = 2, the Q = 1 case is sufficient for the above theo- 
rem to hold as long as H < 0.75. For our fBm model 
H — 0.8 and we find that the results of the theorem (fTB]) 
still hold. Thus we can conjecture that the criterion in 
(fT9)) and (|20p can be relaxed a little so that the inequality 
becomes an approximate inequality. Also, Stoev et. al. 
12 Ol show via simulations that the estimators continue to 



work well even when the criterion in ([T9|) and pO|) are 
not satisfied. The essential reason why this criteria was 
initially introduced, was so that the estimator could dis- 
tinguish between actual long-memory effects and trends 

s 

We now consider the exception that we have found to 
this behaviour - that of the infinite variance processes at 
finite A^. We would expect that in the N ^ oo limit the 
results of the above theorem will also hold for the infinite 
variance processes. However, we believe that in this case 
the convergence will be slow and will depend upon the 
number of scales t that were used to conduct our linear 
least squares regression. 

We will now go beyond the above asymptotic argu- 
ments to discuss the non-Gaussian intermediate finite A^ 
behaviour that we see here in Figure O (b-i) and (b-ii) . 
Recall the expression for Aff (r) given in equation ([5]) 
(for p — 2). We will discuss in detail here the case where 
the sum in ([5]) consists of i.i.d. random variables; this 
is the case for some of our synthetic time series - these 
arguments can be developed for other cases. The PDF 
of this sum will by the Central Limit Theorem tend to a 
Gaussian and for finite N will probably take the form of 
a Pearson xt type variable with i/ degrees of freedom (see 
[2H for more details). However, for an infinite variance 
process, the sum in (O will be dominated by the largest 
extreme events, which in some cases can be of the order 
of the rest of the sum [s^l • This will still be the case even 
when we have excluded some of these extreme events due 
to the iterative conditioning scheme. Thus the sum will 
be distributed as the extreme values of an a-stable Levy 
distribution - which is given by a max-stable Frechet dis- 
tribution (see [131 )• Note that this will be the case for 
any A^, even in the asymptotic large A^ case. Without 
too much detail the form of the PDF P (Mf) of Mf will 
be of the type 



P(Mf) = 



A 



2(M2)i+"/2 



exp 



A 



a{Mf) 



1/2 



(21) 



where any scale parameters have been absorbed into 
the A. One can then convert this Frechet PDF to a 
PDF P (^Mf^^^ corresponding to the dependent vari- 
able log Mf (r) in the least squares scheme in (|17p , which 
under a simple conservation of probability will be given 
by 



P M. 



,r2 



A 

2 exp 



i.log 



-exp(--M5„, 



(22) 

which is in the form of a Gumbel extreme value distri- 
bution; this is another max-stable distribution [io^. The 
Gumbel max-stable PDF has a long slow exponentially 
decreasing right tail; this will imply that a sum of ran- 
dom variables such as p?)) . each distributed with this 
PDF, will eventually tend to a Gaussian distributed ran- 
dom variable, but slowly due to the heavy right tail. This 
then captures our result in Figures [H] (b-i) and (b-ii), and 
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may also explain why we do not obtain the be- 
haviour in the plots of logVar{({2)) Vs. logA^ for the 
a-stable Levy cases. 

As discussed above, for the case of the finite variance 
synthetic time series the Mf will be well described by a 
Gaussian or (more realistically for finite N) a Pearson 
PDF. In the same way as was done with the infinite 
variance processes above, it can be shown that the PDF 
of logMf can be written as a Gumbel min-stable PDF. 
Gumbel min-stable PDF's have long slow exponentially 
decreasing left tails, which in our case will be limited by 
the fact that C(2) > 0. The right tail of Gumbel min- 
stable PDFs is more compact with a rapidly decaying 
exponential tail. Due to the more compact nature of the 
PDF, a sum of logMf variables will tend to a Gaussian 
under the Central Limit Theorem much faster than the 
infinite variance processes above, hence explaining why 
the PDFs H {C{2)) for the finite variance processes are 
well described by a Gaussian. 

Finally, there is the open question of the behaviour of 
the real-world data. The ACE data PDFs of H (C(2)) 
show that they can be well described by a Gaussian; how- 
ever the scaling of Var{(^{2)) with N using our estimation 
shows a significant deviation from the behaviour im- 
plied by (|T8|) . This will be the topic of future work. 

IV. CONCLUSIONS 

We have investigated finite-sample size {N) effects on 
quantifying the statistical scaling properties of time se- 
ries. We focus on the scaling exponent C(2) of the vari- 
ance or second moment which for a wide class of pro- 
cesses also gives the spectral exponent /3 of the (power 
law) power spectrum. If too small a sample size is used 
then these fluctuations effectively introduce a pseudo- 
nonstationarity in the estimates for the scaling expo- 
nents. To achieve an error in the exponent of less than 
~ 5%, we find that the number of data points N needed 



varies significantly with the details of the underlying pro- 
cess and is in the range of 10^ — 10^ for the synthetic 
models used in this paper. The variance in the exponent 
when computed from an interval of N samples is known 
to vary as ~ 1/iV for N —^ oo; however, the conver- 
gence to this behaviour will also depend on the details 
of the process and more importantly on the parameter 
estimating technique used. In particular we have shown 
that heavy tailed processes can be far from this limiting 
behaviour for observationally realisable N. 

We have also considered the case where the scaling 
exponents are time independent, but where there is a 
secular time dependence in other parameters such as the 
standard deviation. For the case of a Brownian motion, 
the estimate of the scaling exponent is not affected by 
this time dependence. It may thus be too premature to 
reject a time series for scaling analysis simply because 
of the non-stationarity of certain parameters i.e. a run- 
ning mean or standard deviation. This also highlights the 
need to distinguish time variation in the moments from 
that in scaling exponents that are derived from them. 

We have focussed here on the moment order scaling 
technique to calculate the scaling exponents in order to 
highlight the issue of apparent non-time stationarity. Al- 
though there exists extensive statistics literature on the 
asymptotic N oo limit of various estimation tech- 
niques, further work is needed to investigate how these 
details pass over to the more pressing and pragmatic need 
for their implications on quantifying scaling in finite and 
realisable TV-sized samples. 
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